Univariate Stock Predictions: LSTM, ARIMA, and prophet
INFO 523 - Final Project
Project description
Author
Affiliation
Matt Osterhoudt
College of Information Science, University of Arizona
Abstract
Time series models can be used to predict track stock data using historical closing values. Importing stock data from Yahoo Finance, we predict MSFT, SPY, and SWPPX by comparing three models: ARIMA (AutoRegressive Integrated Moving Average), LSTM (Long Short-Term Memory Neural Network), and Facebook Prophet. Each model has its strengths, but the LSTM model proved to be the most effective with a high R-squared value across the three stocks (average of .987). The LSTM model also boasted significantly lower Mean Squared Error and Mean Absolute Error when compared to ARIMA and Prophet. This performance conveys that LSTM may prove more effective for moderately long stock predictions and signifies the power of recurring neural networks.
Introduction/Question
Driving question: Which time series model (ARIMA, LSTM, or Prophet) is best for univariate (daily closing price) stock price prediction? This project aims to identify the most effective model using three different stock datasets. The stock data used will be MSFT (Microsoft), SPY (SPDR S&P 500 ETF Trust), and SWPPX (Schwab U.S. Large-Cap ETF). The data will range from the beginning of 2015 to the end of 2024. The only relevant variables used from the data will be the closing price and the date index. “Close” is the closing price of the stock per day. The date index will also be referred to in yyyy-mm-day format. By developing these models, our objective is to clarify which model can be utilized or recognized when it comes to predicting stock data.
Approach
First, I extracted the stock data using the finance package, an API that retrieves stock data from Yahoo Finance. I extracted daily historical data from 2015 to 2024 and retained the “Close” price series as well as the date index for univariate analysis. I did not deem it necessary to preprocess much of the data. This is because there were no outliers I wanted to remove, nor was there missing data outside of holidays and weekends (this is expected). Feature scaling and normalization were performed within certain time series models if necessary. I selected these time series models to deepen my understanding of time series analysis.
ARIMA Approach & Analysis
First, I chose ARIMA (AutoRegressive Integrated Moving Average) for its ability to capture autocorrelated data and patterns after differencing is applied. To help prepare the data, ARIMA requires stationarity tests, which are applied using the ADF (Augmented Dickey-Fuller) test to test raw vs differenced data. If the p-value is less than the significance level (0.05), we may reject the null, implying that the time series data is stationary. If higher than the significant value, we must find the order of differencing. ARIMA also requires ACF(autocorrelation) and PACF(partial autocorrelation) analysis for P and Q lag selections. ACF and PACF are plotted graphically to aid P and Q selection. Determining how ARIMA is configured with these parameters (p, d, q) helps properly fit the model. Each stock’s data was split into 80/20 for training and testing. Predicted values were graphically overlaid on the actual test data for visual inspection and confirmation. The ARIMA model did not perform as well as I expected. Despite the P, D, and Q selection tests, my model fell quite short. As seen with all three stocks, it plotted a very linear “prediction” that seems to have simply taken an average. This tells me that the model failed to have any meaningful predictions, or that seasonality played an unexpected role. Limitations or future implementations: address this by utilizing SARIMA or autoarima. I can also do a more thorough check of my model for anything extraneous.
Next, I selected LSTM for its capability of modeling sequential data (in this case, time series stock). LSTM does not use linear modeling; instead, it learns patterns within its observed cell states. In this model, I normalized the data for efficiency and to prevent scaling issues. I used TensorFlow and devised a predictive pattern per 60 days. In simple terms, the LSTM model uses the previous 60 days to predict a single day’s stock price. This is iterated over the entire stock’s data. The sequential model is then constructed with 50 units of internal memory cells and 8 neurons using Rectified Linear Unit (ReLU), a neural network function. The model is trained on a batch size of 32 samples and is passed through the training set 20 times (epoch of 20). I am not (yet) particularly well-versed in neural network machine learning. Many of the variable number selections (20 epochs, 32 samples, 50 units of internal memory cells, etc.) are fundamental values I selected based on conventional practice. Because this LSTM model is sequenced several times, I also included a Model Checkpoint that will keep the best-performing model. The data was also partitioned into 80/20 for training and testing purposes. The models seen here are running very well. As seen visually, the predicted values are very closely aligned with the actual values. I believe that the LSTM models did a much more thorough job based on its iterative function.
Finally, Prophet is the last model I am implementing. Prophet expects a data frame that consists of two columns: commonly known as “ds” (timestamps, or in other words, date) and “y” (target variable, in my case, daily closing prices). For the most part, my data is already in this format. The only thing I had to implement later was stripping the timezone. Unlike LSTM, this model is linear and is an additive model. Seasonality is a specific feature that this model anticipates. Our stock data is computed daily over around 9 years, so I select the yearly seasonality to be true and the weekly and daily to be false. Another feature of this model is the setting “freq = b”, which removes weekends and holidays. The performance of this model was interesting. Prophet’s predictive power seemed to be only potent for trends it notices. For example, in the graphs shown for SWPPX and SPY, I partitioned the data into 80/20. The location of the partitioned training/test set happens to be where the tail end of a decrease in stock price occurred. The model saw the decreasing trend and continued to predict that it would decrease. However, for MSFT, I changed the set to 90/10, in which some of the training data included the new upward trend past 2023. The MSFT model instead projected more upwards, indicating that the model relied more on short-term trends to predict values. In addition to this, Prophet includes components that can be plotted. I plotted the trend and yearly movements as well.
Discussion & Model Comparison
Stock
Model
MSE
MAE
R²
NMSE
NMAE
MSFT
Prophet
4591.333
65.959
-12.971
11.021
0.158
MSFT
ARIMA
13585.598
105.661
-2.431
37.437
0.291
MSFT
LSTM
63.961
6.459
0.982
0.018
0.018
SWPPX
Prophet
18.707
3.719
-4.200
1.693
0.337
SWPPX
ARIMA
8.303
2.365
-1.056
0.742
0.211
SWPPX
LSTM
0.033
0.132
0.992
0.008
0.012
SPY
Prophet
21337.151
122.298
-4.163
45.381
0.260
SPY
ARIMA
10072.653
84.816
-1.219
21.212
0.179
SPY
LSTM
63.426
6.268
0.986
0.014
0.013
Conclusion & Limitations
Overall, LSTM was the best-performing model by far. Based on the trend of stock data it was receiving, it was able to accurately map temporal dependencies. LSTM, however, did take the longest to compute. This is something that should be considered. I did not specifically dive into the time used of each model, but LSTM was the longest by far. I think that there are certainly improvements that can be made. For example, with ARIMA or Prophet, perhaps testing month to month data over a single year may produce better results. I could have also used different stock data, and my models were limited to only three. For ARIMA, I specifically could have referenced or used SARIMA or AutoArima to test for the best parameters. I did implement the P, Q, and D test myself, and could have gone amiss there. I only referenced univariate analysis as well. I examined the closing price and nothing else. While this is most likely a very big factor, there could be a myriad of other factors at play as well. Perhaps a multivariate analysis that incorporates more features would yield more interesting results.
Source Code
---title: "Univariate Stock Predictions: LSTM, ARIMA, and prophet"subtitle: "INFO 523 - Final Project"author: - name: "Matt Osterhoudt" affiliations: - name: "College of Information Science, University of Arizona"description: "Project description"format: html: code-tools: true code-overflow: wrap embed-resources: trueeditor: visualexecute: warning: false echo: falsejupyter: python3---## AbstractTime series models can be used to predict track stock data using historical closing values. Importing stock data from Yahoo Finance, we predict MSFT, SPY, and SWPPX by comparing three models: ARIMA (AutoRegressive Integrated Moving Average), LSTM (Long Short-Term Memory Neural Network), and Facebook Prophet. Each model has its strengths, but the LSTM model proved to be the most effective with a high R-squared value across the three stocks (average of .987). The LSTM model also boasted significantly lower Mean Squared Error and Mean Absolute Error when compared to ARIMA and Prophet. This performance conveys that LSTM may prove more effective for moderately long stock predictions and signifies the power of recurring neural networks.## Introduction/QuestionDriving question: Which time series model (ARIMA, LSTM, or Prophet) is best for univariate (daily closing price) stock price prediction? This project aims to identify the most effective model using three different stock datasets. The stock data used will be MSFT (Microsoft), SPY (SPDR S&P 500 ETF Trust), and SWPPX (Schwab U.S. Large-Cap ETF). The data will range from the beginning of 2015 to the end of 2024. The only relevant variables used from the data will be the closing price and the date index. “Close” is the closing price of the stock per day. The date index will also be referred to in yyyy-mm-day format. By developing these models, our objective is to clarify which model can be utilized or recognized when it comes to predicting stock data.## ApproachFirst, I extracted the stock data using the finance package, an API that retrieves stock data from Yahoo Finance. I extracted daily historical data from 2015 to 2024 and retained the “Close” price series as well as the date index for univariate analysis. I did not deem it necessary to preprocess much of the data. This is because there were no outliers I wanted to remove, nor was there missing data outside of holidays and weekends (this is expected). Feature scaling and normalization were performed within certain time series models if necessary. I selected these time series models to deepen my understanding of time series analysis. ```{python}#| label: load-packages#| include: false#| echo: false#| warning: false#| message: false# Load packages hereimport pandas as pdimport numpy as npimport seaborn as snsimport yfinance as yfimport matplotlib.pyplot as pltfrom statsmodels.tsa.stattools import adfullerfrom statsmodels.graphics.tsaplots import plot_acf, plot_pacffrom statsmodels.tsa.arima.model import ARIMAfrom sklearn.preprocessing import MinMaxScalerfrom tensorflow.keras.models import Sequentialfrom tensorflow.keras.layers import*from tensorflow.keras.layers import LSTM, Dense, Dropoutfrom tensorflow.keras.metrics import RootMeanSquaredErrorfrom tensorflow.keras.callbacks import ModelCheckpointfrom tensorflow.keras.models import load_modelimport warningswarnings.filterwarnings('ignore')from prophet import Prophetimport statsmodels.api as smimport timefrom sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score# SPY S&P 500 ETF Stockticker1 = yf.Ticker("SPY")data1 = ticker1.history(period ="max")data1 = data1.loc["2015-01-01":"2024-12-31"]data1.to_csv("data/spy_2015_2024.csv")#print(data1.to_string())spy_close = data1[["Close"]]# Schwab Index Stockticker2 = yf.Ticker("SWPPX")data2 = ticker2.history(period ="max")data2 = data2.loc["2015-01-01":"2024-12-31"]data2.to_csv("data/swppx_2015_2024.csv")#print(data2.to_string())swppx_close = data2[["Close"]]# Microsoft Stock. I wanted to include a consistent large cap stockticker3 = yf.Ticker("MSFT")data3 = ticker3.history(period ="max")data3 = data3.loc["2015-01-01":"2024-12-31"]data3.to_csv("data/msft_2015_2024.csv")#print(data3.to_string())data3.info()msft_close = data3[["Close"]]```## ARIMA Approach & AnalysisFirst, I chose ARIMA (AutoRegressive Integrated Moving Average) for its ability to capture autocorrelated data and patterns after differencing is applied. To help prepare the data, ARIMA requires stationarity tests, which are applied using the ADF (Augmented Dickey-Fuller) test to test raw vs differenced data. If the p-value is less than the significance level (0.05), we may reject the null, implying that the time series data is stationary. If higher than the significant value, we must find the order of differencing. ARIMA also requires ACF(autocorrelation) and PACF(partial autocorrelation) analysis for P and Q lag selections. ACF and PACF are plotted graphically to aid P and Q selection. Determining how ARIMA is configured with these parameters (p, d, q) helps properly fit the model. Each stock's data was split into 80/20 for training and testing. Predicted values were graphically overlaid on the actual test data for visual inspection and confirmation. The ARIMA model did not perform as well as I expected. Despite the P, D, and Q selection tests, my model fell quite short. As seen with all three stocks, it plotted a very linear “prediction” that seems to have simply taken an average. This tells me that the model failed to have any meaningful predictions, or that seasonality played an unexpected role. Limitations or future implementations: address this by utilizing SARIMA or autoarima. I can also do a more thorough check of my model for anything extraneous.```{python}#| echo: false#| warning: false#| message: falsewarnings.filterwarnings('ignore')# ===============================# MSFT ARIMA# ===============================# Setting a function for easier titlingdef add_title(ax, title): ax.set_title(title)fig, ax1 = plt.subplots(figsize=(8, 5))msft_close.plot(ax = ax1)add_title(ax1, "MSFT Closing Price")n =int(len(msft_close) *0.8)train_msft = msft_close.iloc[:n]test_msft = msft_close.iloc[n:]test_msft.info()# Manually check for stationarity# ACF and DACF Testfig, ax2 = plt.subplots(figsize = (8, 5))plot_acf(train_msft, ax = ax2)add_title(ax2, "ACF of MSFT Closing Price")fig, ax3 = plt.subplots(figsize = (8, 5))plot_pacf(train_msft, ax = ax3)add_title(ax3, "PACF of MSFT Closing Price")# ADF Testmsft_adf_test = adfuller(train_msft["Close"])print(f'p-value pre-difference: {msft_adf_test[1]}')# Implement differencingtrain_msft_diff = train_msft.diff().dropna()fig, ax4 = plt.subplots(figsize = (8, 5))train_msft_diff.plot(ax = ax4)add_title(ax4, "Differenced MSFT Closing Price")# PACF/ACF Differenced Plotsfig, ax5 = plt.subplots(figsize=(8, 5))plot_acf(train_msft_diff, ax = ax5)add_title(ax5, "ACF of Differenced MSFT")fig, ax6 = plt.subplots(figsize=(8, 5))plot_pacf(train_msft_diff, ax = ax6)add_title(ax6, "PACF of Differenced MSFT")# Data is now stationary after differencing, based on P-valuemsft_adf_test_diff = adfuller(train_msft_diff["Close"])print(f'p-value post-difference: {msft_adf_test_diff[1]}')msft_arima_model = ARIMA(train_msft["Close"], order = (9, 1, 9), trend ='t')msft_arima_result = msft_arima_model.fit()print(msft_arima_result.summary())msft_forecast_test = msft_arima_result.forecast(len(test_msft)) # Predictionpred_msft = msft_arima_result.predict( start =len(train_msft), end =len(train_msft) +len(test_msft) -1, dynamic =True)# Align predicted index to test dates for plottingpred_msft.index = test_msft.index# Plot actual vs forecastfig, ax = plt.subplots(figsize = (8, 5))# Training datatrain_msft["Close"].plot(ax = ax, label ="Train", color="blue")# Test datatest_msft["Close"].plot(ax = ax, label ="Test", color="green")# Forecasted valuespred_msft.plot(ax = ax, label ="Prediction", color ="red")ax.set_title("MSFT ARIMA Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()# ===============================# SPY ARIMA# ===============================fig, ax1 = plt.subplots(figsize = (8, 5))spy_close.plot(ax = ax1)add_title(ax1, "SPY Closing Price")n =int(len(spy_close) *0.8)train_spy = spy_close.iloc[:n]test_spy = spy_close.iloc[n:]#test_spy.info()# Manually check for stationarity# ACF and PACF Testfig, ax2 = plt.subplots(figsize = (8, 5))plot_acf(train_spy, ax = ax2)add_title(ax2, "ACF of SPY Closing Price")fig, ax3 = plt.subplots(figsize = (8, 5))plot_pacf(train_spy, ax = ax3)add_title(ax3, "PACF of SPY Closing Price")# ADF Testspy_adf_test = adfuller(train_spy["Close"])print(f'p-value pre-difference: {spy_adf_test[1]}')# Implement differencingtrain_spy_diff = train_spy.diff().dropna()fig, ax4 = plt.subplots(figsize = (8, 5))train_spy_diff.plot(ax = ax4)add_title(ax4, "Differenced SPY Closing Price")# PACF/ACF Differenced Plotsfig, ax5 = plt.subplots(figsize = (8, 5))plot_acf(train_spy_diff, ax = ax5)add_title(ax5, "ACF of Differenced SPY")fig, ax6 = plt.subplots(figsize = (8, 5))plot_pacf(train_spy_diff, ax = ax6)add_title(ax6, "PACF of Differenced SPY")# Data is now stationary after differencing, based on P-valuespy_adf_test_diff = adfuller(train_spy_diff["Close"])print(f'p-value post-difference: {spy_adf_test_diff[1]}')spy_arima_model = ARIMA(train_spy["Close"], order = (9, 1, 6), trend ='t')spy_arima_result = spy_arima_model.fit()print(spy_arima_result.summary())spy_forecast_test = spy_arima_result.forecast(len(test_spy))# Predictionpred_spy = spy_arima_result.predict( start =len(train_spy), end =len(train_spy) +len(test_spy) -1, dynamic =True)# Align predicted index to test dates for plottingpred_spy.index = test_spy.index# Plot actual vs forecastfig, ax = plt.subplots(figsize=(8, 5))train_spy["Close"].plot(ax = ax, label ="Train", color ="blue")test_spy["Close"].plot(ax = ax, label ="Test", color ="green")pred_spy.plot(ax= ax, label ="Prediction", color ="red")ax.set_title("SPY ARIMA Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()# ===============================# SWPPX ARIMA# ===============================fig, ax1 = plt.subplots(figsize = (8, 5))swppx_close.plot(ax = ax1)add_title(ax1, "SWPPX Closing Price")n =int(len(swppx_close) *0.8)train_swppx = swppx_close.iloc[:n]test_swppx = swppx_close.iloc[n:]#test_swppx.info()# Manually check for stationarity# ACF and PACF Testfig, ax2 = plt.subplots(figsize = (8, 5))plot_acf(train_swppx, ax = ax2)add_title(ax2, "ACF of SWPPX Closing Price")fig, ax3 = plt.subplots(figsize = (8, 5))plot_pacf(train_swppx, ax = ax3)add_title(ax3, "PACF of SWPPX Closing Price")# ADF Testswppx_adf_test = adfuller(train_swppx["Close"])print(f'p-value pre-difference: {swppx_adf_test[1]}')# Implement differencingtrain_swppx_diff = train_swppx.diff().dropna()fig, ax4 = plt.subplots(figsize = (8, 5))train_swppx_diff.plot(ax = ax4)add_title(ax4, "Differenced SWPPX Closing Price")# PACF/ACF Differenced Plotsfig, ax5 = plt.subplots(figsize = (8, 5))plot_acf(train_swppx_diff, ax = ax5)add_title(ax5, "ACF of Differenced SWPPX")fig, ax6 = plt.subplots(figsize = (8, 5))plot_pacf(train_swppx_diff, ax = ax6)add_title(ax6, "PACF of Differenced SWPPX")# Data is now stationary after differencing, based on P-valueswppx_adf_test_diff = adfuller(train_swppx_diff["Close"])print(f'p-value post-difference: {swppx_adf_test_diff[1]}')swppx_arima_model = ARIMA(train_swppx["Close"], order = (2, 1, 2), trend ='t')swppx_arima_result = swppx_arima_model.fit()print(swppx_arima_result.summary())swppx_forecast_test = swppx_arima_result.forecast(len(test_swppx))# Predictionpred_swppx = swppx_arima_result.predict( start =len(train_swppx), end =len(train_swppx) +len(test_swppx) -1, dynamic =True)# Align predicted index to test dates for plottingpred_swppx.index = test_swppx.index# Plot actual vs forecastfig, ax = plt.subplots(figsize = (8, 5))# Training datatrain_swppx["Close"].plot(ax = ax, label ="Train", color ="blue")# Test datatest_swppx["Close"].plot(ax = ax, label ="Test", color ="green")# Forecasted valuespred_swppx.plot(ax = ax, label ="Prediction", color ="red")ax.set_title("SWPPX ARIMA Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()```## LSTM Approach & AnalysisNext, I selected LSTM for its capability of modeling sequential data (in this case, time series stock). LSTM does not use linear modeling; instead, it learns patterns within its observed cell states. In this model, I normalized the data for efficiency and to prevent scaling issues. I used TensorFlow and devised a predictive pattern per 60 days. In simple terms, the LSTM model uses the previous 60 days to predict a single day’s stock price. This is iterated over the entire stock’s data. The sequential model is then constructed with 50 units of internal memory cells and 8 neurons using Rectified Linear Unit (ReLU), a neural network function. The model is trained on a batch size of 32 samples and is passed through the training set 20 times (epoch of 20). I am not (yet) particularly well-versed in neural network machine learning. Many of the variable number selections (20 epochs, 32 samples, 50 units of internal memory cells, etc.) are fundamental values I selected based on conventional practice. Because this LSTM model is sequenced several times, I also included a Model Checkpoint that will keep the best-performing model. The data was also partitioned into 80/20 for training and testing purposes. The models seen here are running very well. As seen visually, the predicted values are very closely aligned with the actual values. I believe that the LSTM models did a much more thorough job based on its iterative function.```{python}#| echo: false#| warning: false#| message: falsewarnings.filterwarnings('ignore')# ===============================# MSFT LSTM# ===============================msft_close_data = msft_close.values # Taking shape# This time, I am scaling the datascaler = MinMaxScaler(feature_range=(0, 1))scaled_msft_close = scaler.fit_transform(msft_close_data)# This function takes the data by intervals of 60, and converts it to supervised learning examples for later.def create_sequences(data, interval =60): X = [] y = []for i inrange(interval, len(data)): X.append(data[i-interval:i, 0]) y.append(data[i, 0])return np.array(X), np.array(y)interval =60# days of history per sampleX_all, y_all = create_sequences(scaled_msft_close, interval)# Double checking the outputX_all.view()y_all.view()# Usual 80/20 splitn_train =int(len(X_all) *0.8)X_train, X_test = X_all[:n_train], X_all[n_train:]y_train, y_test = y_all[:n_train], y_all[n_train:]# LSTM Tensorflow needs input in [samples, timesteps, features] form, so this code converts our 2d input into 3d. The final "1" is the new features dimension.X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))# Building model with parametersmsft_lstm_model = Sequential()msft_lstm_model.add(InputLayer(input_shape = (interval, 1)))# Via online research as to what units to use msft_lstm_model.add(LSTM(50))msft_lstm_model.add(Dropout(0.2))msft_lstm_model.add(Dense(8, "relu"))msft_lstm_model.add(Dense(1, "linear"))msft_lstm_model.summary()# This code will save the best modelmsft_cp = ModelCheckpoint("msft_lstm_model/.keras", save_best_only =True)# Builds model and tracks MSE, RMSE, and default value lossmsft_lstm_model.compile(optimizer ="adam", loss ="mean_squared_error", metrics = [RootMeanSquaredError()])# Training model with 32 sequences and 20 epoch parameter.msft_lstm_model.fit(X_train, y_train, batch_size =32, epochs =20, validation_data = (X_test, y_test), callbacks = [msft_cp], verbose =0)# Loads best model for our plot latermsft_lstm_model = load_model("msft_lstm_model/.keras")msft_predictions = msft_lstm_model.predict(X_test)# Undoing the scalermsft_predictions = scaler.inverse_transform(msft_predictions.reshape(-1, 1))y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))# MSFT LSTM Plotmsft_train_dates = msft_close.index[interval:n_train + interval]msft_test_dates = msft_close.index[n_train + interval:]fig, ax = plt.subplots(figsize = (8,5))ax.plot(msft_test_dates, y_test_actual, label ="Actual", color ="green")ax.plot(msft_test_dates, msft_predictions, label ="Predicted", color ="red")ax.set_title("MSFT LSTM Prediction vs Actual")ax.set_xlabel("Date")ax.set_ylabel("Price")ax.legend()plt.show()# ===============================# SPY LSTM# ===============================spy_close_data = spy_close.values # Taking shapescaler = MinMaxScaler(feature_range = (0, 1))scaled_spy_close = scaler.fit_transform(spy_close_data)X_all, y_all = create_sequences(scaled_spy_close, interval =60)X_all.view()y_all.view()n_train =int(len(X_all) *0.8)X_train, X_test = X_all[:n_train], X_all[n_train:]y_train, y_test = y_all[:n_train], y_all[n_train:]X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))spy_lstm_model = Sequential()spy_lstm_model.add(InputLayer(input_shape = (interval, 1)))spy_lstm_model.add(LSTM(50))spy_lstm_model.add(Dropout(0.2))spy_lstm_model.add(Dense(8, "relu"))spy_lstm_model.add(Dense(1, "linear"))spy_lstm_model.summary()spy_cp = ModelCheckpoint("spy_lstm_model/.keras", save_best_only =True)spy_lstm_model.compile(optimizer ="adam", loss ="mean_squared_error", metrics = [RootMeanSquaredError()])spy_lstm_model.fit(X_train, y_train, batch_size=32, epochs =20, validation_data = (X_test, y_test), callbacks = [spy_cp], verbose =0)spy_lstm_model = load_model("spy_lstm_model/.keras")spy_predictions = spy_lstm_model.predict(X_test)spy_predictions = scaler.inverse_transform(spy_predictions.reshape(-1, 1))spy_y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))spy_train_dates = spy_close.index[interval:n_train + interval]spy_test_dates = spy_close.index[n_train + interval:]fig, ax = plt.subplots(figsize = (8,5))ax.plot(spy_test_dates, spy_y_test_actual, label ="Actual", color ="green")ax.plot(spy_test_dates, spy_predictions, label ="Predicted", color ="red")ax.set_title("SPY LSTM Prediction vs Actual")ax.set_xlabel("Date")ax.set_ylabel("Price")ax.legend()plt.show()test_spy_lstm = pd.DataFrame({"Date": spy_test_dates,"Actual": spy_y_test_actual.flatten(),"Predicted": spy_predictions.flatten()})# ===============================# SWPPX LSTM# ===============================swppx_close_data = swppx_close.values # Taking shapescaler = MinMaxScaler(feature_range = (0, 1))scaled_swppx_close = scaler.fit_transform(swppx_close_data)X_all, y_all = create_sequences(scaled_swppx_close, interval =60)X_all.view()y_all.view()n_train =int(len(X_all) *0.8)X_train, X_test = X_all[:n_train], X_all[n_train:]y_train, y_test = y_all[:n_train], y_all[n_train:]X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))swppx_lstm_model = Sequential()swppx_lstm_model.add(InputLayer(input_shape = (interval, 1)))swppx_lstm_model.add(LSTM(50))swppx_lstm_model.add(Dropout(0.2))swppx_lstm_model.add(Dense(8, "relu"))swppx_lstm_model.add(Dense(1, "linear"))swppx_lstm_model.summary()swppx_cp = ModelCheckpoint("swppx_lstm_model/.keras", save_best_only =True)swppx_lstm_model.compile(optimizer ="adam", loss ="mean_squared_error", metrics = [RootMeanSquaredError()])swppx_lstm_model.fit(X_train, y_train, batch_size =32, epochs =20, validation_data = (X_test, y_test), callbacks = [swppx_cp], verbose =0)swppx_lstm_model = load_model("swppx_lstm_model/.keras")swppx_predictions = swppx_lstm_model.predict(X_test)swppx_predictions = scaler.inverse_transform(swppx_predictions.reshape(-1, 1))swppx_y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))swppx_train_dates = swppx_close.index[interval:n_train + interval]swppx_test_dates = swppx_close.index[n_train + interval:]fig, ax = plt.subplots(figsize = (8,5))ax.plot(swppx_test_dates, swppx_y_test_actual, label ="Actual", color ="green")ax.plot(swppx_test_dates, swppx_predictions, label ="Predicted", color ="red")ax.set_title("SWPPX LSTM Prediction vs Actual")ax.set_xlabel("Date")ax.set_ylabel("Price")ax.legend()plt.show()test_swppx_lstm = pd.DataFrame({"Date": swppx_test_dates,"Actual": swppx_y_test_actual.flatten(),"Predicted": swppx_predictions.flatten()})```## Prophet Approach & AnalysisFinally, Prophet is the last model I am implementing. Prophet expects a data frame that consists of two columns: commonly known as “ds” (timestamps, or in other words, date) and “y” (target variable, in my case, daily closing prices). For the most part, my data is already in this format. The only thing I had to implement later was stripping the timezone. Unlike LSTM, this model is linear and is an additive model. Seasonality is a specific feature that this model anticipates. Our stock data is computed daily over around 9 years, so I select the yearly seasonality to be true and the weekly and daily to be false. Another feature of this model is the setting “freq = b”, which removes weekends and holidays. The performance of this model was interesting. Prophet’s predictive power seemed to be only potent for trends it notices. For example, in the graphs shown for SWPPX and SPY, I partitioned the data into 80/20. The location of the partitioned training/test set happens to be where the tail end of a decrease in stock price occurred. The model saw the decreasing trend and continued to predict that it would decrease. However, for MSFT, I changed the set to 90/10, in which some of the training data included the new upward trend past 2023. The MSFT model instead projected more upwards, indicating that the model relied more on short-term trends to predict values. In addition to this, Prophet includes components that can be plotted. I plotted the trend and yearly movements as well.```{python}#| include: false#| echo: false#| warning: false#| message: falsewarnings.simplefilter("ignore")# ============================# MSFT Prophet Forecast# ============================# Based on research, Prophet requires a dataframe with two columns: ds (datestamp) and y (numeric forecast measurement). Common naming conventions appear to be ds and y so I will keep that consistent.msft_prophet_data = msft_close.reset_index()# Ran into an error with the datetime format. Timezone must be removed.msft_prophet_data["Date"] = pd.to_datetime(msft_prophet_data["Date"]).dt.tz_localize(None)msft_prophet_data = msft_prophet_data[["Date", "Close"]].rename(columns = {"Date": "ds", "Close": "y"})msft_prophet_data.head()# Split to training and test againn =int(len(msft_prophet_data) *0.9)train_msft_prophet = msft_prophet_data.iloc[:n]test_msft_prophet = msft_prophet_data.iloc[n:]# Setting the model parametersmsft_proph_model = Prophet( growth ="linear", yearly_seasonality =True, weekly_seasonality =False, daily_seasonality =False )msft_proph_model.fit(train_msft_prophet) # Dataframe for the forecastingfuture_msft_values = msft_proph_model.make_future_dataframe(periods =len(test_msft_prophet), freq ="B") # No weekendsforecast_msft = msft_proph_model.predict(future_msft_values)forecast_msft.head()# Plotting the Prophet Modelfig, ax = plt.subplots(figsize = (8, 5))ax.plot(train_msft_prophet["ds"], train_msft_prophet["y"], label ="Train", color ="blue")ax.plot(test_msft_prophet["ds"], test_msft_prophet["y"], label ="Test", color ="green")ax.plot(forecast_msft["ds"], forecast_msft["yhat"], label ="Prophet Prediction", color ="red")ax.set_title("MSFT Prophet Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()fig = msft_proph_model.plot_components(forecast_msft)fig.suptitle("MSFT Prophet Components", y =1.02)plt.show()# ============================# SWPPX Prophet Forecast# ============================swppx_prophet_data = swppx_close.reset_index()# Ran into an error with the datetime format. Timezone must be removed.swppx_prophet_data["Date"] = pd.to_datetime(swppx_prophet_data["Date"]).dt.tz_localize(None)swppx_prophet_data = swppx_prophet_data[["Date", "Close"]].rename(columns = {"Date": "ds", "Close": "y"})swppx_prophet_data.head()# Split to training and test againn =int(len(swppx_prophet_data) *0.8)train_swppx_prophet = swppx_prophet_data.iloc[:n]test_swppx_prophet = swppx_prophet_data.iloc[n:]# Setting the model parametersswppx_proph_model = Prophet( growth ="linear", yearly_seasonality =True, weekly_seasonality =False, daily_seasonality =False )swppx_proph_model.fit(train_swppx_prophet) # Dataframe for the forecastingfuture_swppx_values = swppx_proph_model.make_future_dataframe(periods =len(test_swppx_prophet), freq ="B") # No weekendsforecast_swppx = swppx_proph_model.predict(future_swppx_values)forecast_swppx.head()# Plotting the Prophet Modelfig, ax = plt.subplots(figsize = (8, 5))ax.plot(train_swppx_prophet["ds"], train_swppx_prophet["y"], label ="Train", color ="blue")ax.plot(test_swppx_prophet["ds"], test_swppx_prophet["y"], label ="Test", color ="green")ax.plot(forecast_swppx["ds"], forecast_swppx["yhat"], label ="Prophet Prediction", color ="red")ax.set_title("SWPPX Prophet Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()fig = swppx_proph_model.plot_components(forecast_swppx)fig.suptitle("SWPPX Prophet Components", y =1.02)plt.show()# ============================# SPY Prophet Forecast# ============================spy_prophet_data = spy_close.reset_index()# Ran into an error with the datetime format. Timezone must be removed.spy_prophet_data["Date"] = pd.to_datetime(spy_prophet_data["Date"]).dt.tz_localize(None)spy_prophet_data = spy_prophet_data[["Date", "Close"]].rename(columns = {"Date": "ds", "Close": "y"})spy_prophet_data.head()# Split to training and test againn =int(len(spy_prophet_data) *0.8)train_spy_prophet = spy_prophet_data.iloc[:n]test_spy_prophet = spy_prophet_data.iloc[n:]# Setting the model parametersspy_proph_model = Prophet( growth ="linear", yearly_seasonality =True, weekly_seasonality =False, daily_seasonality =False )spy_proph_model.fit(train_spy_prophet) # Dataframe for the forecastingfuture_spy_values = spy_proph_model.make_future_dataframe(periods =len(test_spy_prophet), freq ="B") # No weekendsforecast_spy = spy_proph_model.predict(future_spy_values)forecast_spy.head()# Plotting the Prophet Modelfig, ax = plt.subplots(figsize = (8, 5))ax.plot(train_spy_prophet["ds"], train_spy_prophet["y"], label ="Train", color ="blue")ax.plot(test_spy_prophet["ds"], test_spy_prophet["y"], label ="Test", color ="green")ax.plot(forecast_spy["ds"], forecast_spy["yhat"], label ="Prophet Prediction", color ="red")ax.set_title("SPY Prophet Forecast vs Actual")ax.set_xlabel("Date")ax.legend()plt.show()fig = spy_proph_model.plot_components(forecast_spy)fig.suptitle("SPY Prophet Components", y =1.02)plt.show()```## Discussion & Model Comparison```{python}# Model Comparison# Defining a function to check the values. It was necessary to do a merge because of the inconsistent sizes.def eval_prophet(test_df, forecast_df, lab): merged = pd.merge( test_df[["ds", "y"]], forecast_df[["ds", "yhat"]], on ="ds", how ="inner" ) y_true = merged["y"].values y_pred = merged["yhat"].values mean_y = np.mean(y_true) mse = mean_squared_error(y_true, y_pred) mae = mean_absolute_error(y_true, y_pred) r2 = r2_score(y_true, y_pred) nmse = mse / mean_y nmae = mae / mean_yprint(f"Prophet {lab}: "f"MSE: {mse:} | MAE: {mae:} | R²: {r2:} | "f"NMSE: {nmse:} | NMAE: {nmae:}")#eval_prophet(test_msft_prophet, forecast_msft, "MSFT")#eval_prophet(test_swppx_prophet, forecast_swppx, "SWPPX")#eval_prophet(test_spy_prophet, forecast_spy, "SPY")# Function for ARIMA testingdef eval_arima(test_df, forecast_df, lab):# Ensuring consistency across the forecasting models. Forecast is merged with the test set, and the models may store dates in different ways. This logic ensures that the alignment is consistent across the functions.ifisinstance(forecast_df, pd.Series): forecast_df = forecast_df.to_frame(name ="yhat")else: forecast_df = forecast_df.rename(columns = {forecast_df.columns[0]: "yhat"}) merged = pd.merge( test_df.reset_index()[["Date", "Close"]], forecast_df.reset_index()[["Date", "yhat"]], on ="Date", how ="inner" ) y_true = merged["Close"].values y_pred = merged["yhat"].values mean_y = np.mean(y_true) mse = mean_squared_error(y_true, y_pred) mae = mean_absolute_error(y_true, y_pred) r2 = r2_score(y_true, y_pred) nmse = mse / mean_y nmae = mae / mean_yprint(f"ARIMA {lab}: "f"MSE: {mse:} | MAE: {mae:} | R²: {r2:} | "f"NMSE: {nmse:} | NMAE: {nmae:}")#eval_arima(test_msft, pred_msft, "MSFT")#eval_arima(test_swppx, pred_swppx, "SWPPX")#eval_arima(test_spy, pred_spy, "SPY")# I did this function a little differently because I was running into issues... def eval_lstm(y_true, y_pred, lab):# Flatten to 1D y_true = y_true.flatten() y_pred = y_pred.flatten() mse = mean_squared_error(y_true, y_pred) mae = mean_absolute_error(y_true, y_pred) r2 = r2_score(y_true, y_pred) nmse = mse / np.var(y_true) # variance-based nmae = mae / np.mean(y_true) # mean-basedprint(f"LSTM {lab}: "f"MSE: {mse:} | MAE: {mae:} | R²: {r2:} | "f"NMSE: {nmse:} | NMAE: {nmae:}")#eval_lstm(y_test_actual, msft_predictions, "MSFT")#eval_lstm(swppx_y_test_actual, swppx_predictions, "SWPPX")#eval_lstm(spy_y_test_actual, spy_predictions, "SPY")```| Stock | Model | MSE | MAE | R² | NMSE | NMAE ||---------|---------|------------|----------|-----------|---------|---------|| MSFT | Prophet | 4591.333 | 65.959 | -12.971 | 11.021 | 0.158 || MSFT | ARIMA | 13585.598 | 105.661 | -2.431 | 37.437 | 0.291 || MSFT | LSTM | 63.961 | 6.459 | 0.982 | 0.018 | 0.018 || SWPPX | Prophet | 18.707 | 3.719 | -4.200 | 1.693 | 0.337 || SWPPX | ARIMA | 8.303 | 2.365 | -1.056 | 0.742 | 0.211 || SWPPX | LSTM | 0.033 | 0.132 | 0.992 | 0.008 | 0.012 || SPY | Prophet | 21337.151 | 122.298 | -4.163 | 45.381 | 0.260 || SPY | ARIMA | 10072.653 | 84.816 | -1.219 | 21.212 | 0.179 || SPY | LSTM | 63.426 | 6.268 | 0.986 | 0.014 | 0.013 |## Conclusion & LimitationsOverall, LSTM was the best-performing model by far. Based on the trend of stock data it was receiving, it was able to accurately map temporal dependencies. LSTM, however, did take the longest to compute. This is something that should be considered. I did not specifically dive into the time used of each model, but LSTM was the longest by far. I think that there are certainly improvements that can be made. For example, with ARIMA or Prophet, perhaps testing month to month data over a single year may produce better results. I could have also used different stock data, and my models were limited to only three. For ARIMA, I specifically could have referenced or used SARIMA or AutoArima to test for the best parameters. I did implement the P, Q, and D test myself, and could have gone amiss there. I only referenced univariate analysis as well. I examined the closing price and nothing else. While this is most likely a very big factor, there could be a myriad of other factors at play as well. Perhaps a multivariate analysis that incorporates more features would yield more interesting results.